
AD-A181 406

REPORT DOCUMENTATION PAGE

1a.  Report security classification: UNCLASSIFIED
1b.  Restrictive markings:
2a.  Security classification authority:
2b.  Declassification/downgrading schedule:
3.   Distribution/availability of report: Approved for public release;
     distribution unlimited.
4.   Performing organization report number(s):
5.   Monitoring organization report number(s): AFOSR-TR-87-0790
6a.  Name of performing organization: Northwestern University
6c.  Address (City, State and ZIP Code): 619 Clark Street,
     Evanston, IL 60201
7a.  Name of monitoring organization: Air Force Office of Scientific Research
7b.  Address (City, State and ZIP Code): Directorate of Mathematical &
     Information Sciences, Bolling AFB DC 20332-6448
8a.  Name of funding/sponsoring organization: AFOSR
8b.  Office symbol (if applicable): NM
8c.  Address (City, State and ZIP Code): Bolling AFB DC 20332-6448
10.  Source of funding nos.: Program element no. 61102F; Project no. 2304;
     Task no. C3; Work unit no.
11.  Title (Include Security Classification): MONTE CARLO RELIABILITY
     ANALYSIS (Unclassified)
12.  Personal author(s): Prof. E. E. Lewis
13a. Type of report: Annual Technical
13b. Time covered: from 9/1/85 to 8/31/86
14.  Date of report (Yr., Mo., Day): 12/1/86
15.  Page count:
16.  Supplementary notation:
17.  COSATI codes (field, group, sub. gr.):
18.  Subject terms: Reliability, Monte Carlo, Simulation
19.  Abstract:

     The work carried out during the 1985/86 contract year is summarized:

     Markov Monte Carlo methods are generalized to include inhomogeneous
     Markov processes. Two new sampling techniques allow the treatment of
     reliability problems which include time-dependent failure rates and
     preventive maintenance. Incorporation of periodic testing and repair
     allows classes of revealed and unrevealed failures to be combined in
     problems with wear, periodic maintenance and component dependencies.

20.  Distribution/availability of abstract: Unclassified/unlimited
21.  Abstract security classification: UNCLASSIFIED
22a. Name of responsible individual:
22b. Telephone number (Include Area Code): (202) 767-
22c. Office symbol: NM

DD FORM 1473, 83 APR. Edition of 1 Jan 73 is obsolete.
Security classification of this page: UNCLASSIFIED


1985/86  Activity 

During the 1985/86 contract period research has been concentrated on the
use of inhomogeneous Markov processes to treat time-dependent failure rates
and preventive maintenance, and on the effective Monte Carlo simulation of the
resulting models. This work is summarized in the enclosed paper, which was
recently published in Reliability Engineering.

The  principal  investigator  was  assisted  by  Mr.  Z.  Tu  (not  supported  by 
AFOSR) in the work on inhomogeneous Markov processes. In addition, the
contract  supported  a  Ph.D.  student,  Mr.  F.  Boehm.  During  the  contract  period 
Mr.  Boehm  began  to  examine  two  related  aspects  of  Monte  Carlo  simulation. 
First,  he  examined  the  departures  from  the  Markov  condition  that  are  needed  to 
study  parts  replacement  policies.  This  work  is  in  an  active  state  of 
development,  with  computer  simulations  being  carried  out  presently.  It  will 
be  reported  at  the  end  of  the  1986/87  contract  year. 

In  addition,  Mr.  Boehm  is  examining  more  closely  the  sources  of  data  for 
wear  phenomena  that  give  rise  to  failure  rates  that  increase  with  time;  his 
emphasis  is  on  the  incorporation  of  more  realistic  fatigue  failure  models  into 
the  simulation  of  mechanical  components. 

We are interested in applying our simulation methods to problems of
active interest to the Air Force. To this end the principal investigator has
arranged a trip to the Systems Reliability and Engineering Division at the
Rome Air Development Center, Griffiss AFB, NY, to determine how our methods
might interface with the ORACLE code system.




Reliability  Engineering  16  (1986)  277-296 


Monte  Carlo  Reliability  Modeling  by  Inhomogeneous 
Markov  Processes 


E.  E.  Lewis  and  Tu  Zhuguo* 

Department  of  Mechanical  and  Nuclear  Engineering,  Northwestern  University, 
Evanston,  Illinois  60201,  USA 

(Received:  9  April  1986) 


ABSTRACT 

Markov Monte Carlo methods for reliability calculations are generalized
to include inhomogeneous Markov processes. Two new sampling techniques
allow the treatment of time-dependent failure rates and of
preventive maintenance. Incorporation of periodic testing and repair
models allows classes of revealed and unrevealed failures to be combined in
problems with wear, periodic maintenance and component dependencies.
Numerical illustrations of these phenomena are presented for one, two
and multiple component systems.


1  INTRODUCTION 

In previous papers¹,² the Lagrangian approach of Markov Monte Carlo
methods has been shown to be very effective for estimating the reliability
and availability of complex systems. The ability to treat general
component dependencies in multicomponent systems, coupled with the
use of variance reduction techniques to greatly increase sampling
efficiency, results in highly efficient algorithms capable of treating
Markov models that would be intractable by deterministic computational
methods. There are, however, two major limitations on the ability of
the foregoing Monte Carlo methods to model reliability problems
faithfully: they are limited to constant failure rates and to revealed
failures. In order to model component aging or wear, and the
concomitant effects of preventive maintenance, the formulation must be
generalized to include time-dependent failure rates. If failures are
unrevealed, then periodic testing also must be included in the modeling.

*Permanent address: Beijing Nuclear Engineering Institute, PO Box 840,
Beijing, People's Republic of China.

Reliability Engineering 0143-8174/86/$03.50 © Elsevier Applied Science
Publishers Ltd, England, 1986. Printed in Great Britain

A  perfectly  general  treatment  of  wear,  maintenance  and  repair 
phenomena  would  require  a  number  of  departures  from  the  conditions 
that  define  Markov  processes.  However,  a  number  of  the  more  important 
of  these  phenomena  can  be  modeled  by  generalizing  the  existing  Monte 
Carlo  methods,  which  are  limited  to  homogeneous  Markov  processes, 
to treat inhomogeneous Markov processes. When the resulting
time-dependent transition rates are combined with the ability to make
deterministic state transitions at specified time intervals, the resulting
simulation can treat wear, preventive maintenance at specified times,
and reasonable approximations to the repair of several classes of revealed
and unrevealed failures.

In  this  paper  we  formulate  the  required  equations  and  present  two 
methods  for  transition  sampling  in  the  presence  of  time-dependent 
transition  rates:  self-transitions  and  mode  sampling.  The  procedures  are 
then  generalized  to  allow  for  periodic  testing  and  repair.  Finally,  we 
examine  a  number  of  problems  for  one,  two  and  multicomponent 
systems  in  which  component  dependencies  of  the  load  sharing  variety 
are present. For such systems the ability of inhomogeneous Markov
Monte Carlo simulation to treat wear, preventive maintenance and
combinations of revealed and unrevealed failures is thus demonstrated.


2  THE  INHOMOGENEOUS  MARKOV  EQUATIONS 

In order to treat reliability for systems in which component wear and/or
early failures are present, we must represent the time dependence of
failure rates of real components. These typically are represented in the
form of 'bathtub' curves, such as that illustrated in Fig. 1. As in earlier
work, however, we also must be able to represent dependencies between
component failures, such as occur, for example, with shared loads,
shared repair crews and standby configurations. To incorporate
time-dependent failure rates into a system with component failure
dependencies we generalize our earlier homogeneous Markov formalism into
an inhomogeneous Markov formalism: one in which the transition rates
between states are explicit functions of time.

Fig. 1. Failure rate curve exhibiting 'bathtub' behavior.

To begin, assume that we have a system with $N$ components, each of
which may be in an operational or a failed state. There are then $2^N$
states corresponding to the unique combinations of operational and
failed components. We let $p_k(t)$ be the probability that the system is
in state $k$ at time $t$, and therefore

$$\sum_k p_k(t) = 1.$$

We designate $k = 0$ as the initial state in which all components are
operational, and then

$$p_k(0) = \delta_{k0}.$$

The equations for the continuous time Markov process governing the
probabilities $p_k(t)$ are

$$\frac{d}{dt} p_k(t) = -\sum_{k'} \gamma_{k'k}(t)\, p_k(t) + \sum_{k'} \gamma_{kk'}(t)\, p_{k'}(t) \qquad (1)$$

where the $\gamma_{kk'}(t)$ are the transition rates between states,
$\gamma_{kk'}(t)$ being the rate of transitions from state $k'$ into state
$k$. More precisely, the system is defined as a semi-Markov process.³ Each
transition leads to a state change. Thus,

$$\gamma_{kk}(t) = 0$$

and the summations do not include the $k' = k$ terms.

It is convenient to rewrite eqn (1) as

$$\frac{d}{dt} p_k(t) = -\gamma_k(t)\, p_k(t) + \sum_{k'} q(k \mid k', t)\, \gamma_{k'}(t)\, p_{k'}(t)$$

where

$$\gamma_k(t) = \sum_{k' \neq k} \gamma_{k'k}(t)$$

and

$$q(k \mid k', t) = \gamma_{kk'}(t) / \gamma_{k'}(t).$$

The quantity $q(k \mid k', t)$ is readily seen to be the conditional
probability that, given a transition out of state $k'$ at time $t$, the new
state will be $k$.

In general, $\gamma_k(t)$, the transition rate out of state $k$, can be
represented as

$$\gamma_k(t) = \sum_{l \in O_k} \lambda_l(t) + \sum_{l \in F_k} \mu_l(t)$$

where $\lambda_l(t)$ and $\mu_l(t)$ are the failure and repair rates of
component $l$, and $O_k$ and $F_k$ are the sets of operational and failed
components, respectively, in state $k$. In the case where component
dependencies are present, $\lambda_l(t)$ and/or $\mu_l(t)$ will also depend
on the system state $k$. For example, in a standby configuration the failure
rate of the backup unit will depend strongly on whether the primary unit is
operational.
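
The rate out of a state can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the bitmask encoding of states (bit `l` set when component `l` is failed) and the name `rate_out_of_state` are hypothetical choices made here, and the components are taken to be independent, so the rates do not depend on the state.

```python
from typing import Callable, Sequence

Rate = Callable[[float], float]  # a time-dependent rate, e.g. lambda_l(t)


def rate_out_of_state(state: int, t: float,
                      failure_rates: Sequence[Rate],
                      repair_rates: Sequence[Rate]) -> float:
    """Total transition rate gamma_k(t) out of a state.

    Assumed encoding: bit l of `state` is 1 when component l is failed,
    so state 0 is the all-operational state k = 0.
    """
    total = 0.0
    for l, (lam, mu) in enumerate(zip(failure_rates, repair_rates)):
        failed = (state >> l) & 1
        # Failed components contribute their repair rate,
        # operational components their failure rate.
        total += mu(t) if failed else lam(t)
    return total
```

With state-dependent rates (shared loads, standby units) the two rate callables would also need the state as an argument; that extension is left out of this sketch.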

The equation for $p_k(t)$ may be put in integral form. By using an
integrating factor

$$\exp\left[\int_0^t \gamma_k(t')\, dt'\right]$$

we obtain

$$p_k(t) = \delta_{k0} \exp\left[-\int_0^t \gamma_0(t')\, dt'\right] + \int_0^t dt' \exp\left[-\int_{t'}^{t} \gamma_k(t'')\, dt''\right] \sum_{k'} q(k \mid k', t')\, \gamma_{k'}(t')\, p_{k'}(t').$$

To express this equation in terms of the probability distributions sampled
in the Monte Carlo simulations, we introduce

$$\chi_k(t) \equiv \gamma_k(t)\, p_k(t),$$

which is the probability density of transitions out of state $k$, and
multiply by $\gamma_k(t)$ to obtain

$$\chi_k(t) = \int_0^t dt'\, f(t \mid t', k) \left[\delta_{k0}\, \delta(t') + \sum_{k'} q(k \mid k', t')\, \chi_{k'}(t')\right] \qquad (2)$$

where

$$f(t \mid t', k') = \gamma_{k'}(t) \exp\left\{-\int_{t'}^{t} \gamma_{k'}(t'')\, dt''\right\}, \qquad t \ge t' \qquad (3)$$

is the probability density that there will be a transition at $t$ given that
the system is in state $k'$ at $t'$.

We may also write this equation in the notation of an earlier paper,¹
by noting that

$$\psi_k(t) = \delta_{k,0}\, \delta(t) + \sum_{k'} q(k \mid k', t)\, \chi_{k'}(t), \qquad t \ge 0 \qquad (4)$$

is the probability density for transitions into state $k$. The first term on
the right is due to the convention that the problem is initialised by a
transition into state $k = 0$ at $t = 0$. Combining eqns (2) and (4), we
obtain

$$\psi_k(t) = \delta_{k,0}\, \delta(t) + \sum_{k'} q(k \mid k', t) \int_0^t dt'\, f(t \mid t', k')\, \psi_{k'}(t').$$


3  MONTE  CARLO  SAMPLING 


If the transition rates are taken to be time-independent, the foregoing
equations reduce to the homogeneous Markov formulation. In that
case, eqn (3) becomes the exponential distribution and the resulting
Monte Carlo sampling for transition times is straightforward. When
time-dependent transition rates are present, however, the direct inversion
technique used for the exponential distribution¹ is no longer applicable.
To illustrate, we first find the cumulative distribution corresponding to
eqn (3) to be

$$F(t \mid t', k') = 1 - \exp\left\{-\int_{t'}^{t} \gamma_{k'}(t'')\, dt''\right\} \qquad (5)$$

To perform direct inversion sampling we set $F(t \mid t', k')$ to a random
number $\xi$ uniformly distributed between zero and one. We obtain

$$\int_{t'}^{t} \gamma_{k'}(t'')\, dt'' = -\ln(1 - \xi)$$

The difficulty, of course, is in inverting this expression for $t$. As an
alternative we present two methods for sampling the times between
transitions: mode sampling and self-transitions. Following the transition,
the sampling for the new state $k$ is straightforward. As in the
homogeneous Markov formulation, we simply choose a uniformly distributed
random number, say $\xi'$, and then choose the state which satisfies the
condition

$$\sum_{k''=0}^{k-1} q(k'' \mid k', t) < \xi' \le \sum_{k''=0}^{k} q(k'' \mid k', t) \qquad (6)$$

where

$$q(k' \mid k', t) = 0$$

and $t$ is taken at the time of transition.
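
The state selection of eqn (6) is ordinary discrete inversion sampling, which can be sketched as follows. The function name `select_new_state` and the representation of the conditional probabilities as a plain list indexed by state are assumptions made for this illustration.

```python
from typing import Sequence


def select_new_state(q: Sequence[float], xi: float) -> int:
    """Return the state k with sum(q[:k]) < xi <= sum(q[:k + 1]).

    `q[k]` is the conditional probability q(k | k', t) of landing in
    state k, evaluated at the transition time; `xi` is uniform on (0, 1].
    """
    cumulative = 0.0
    for k, prob in enumerate(q):
        cumulative += prob
        # Skip zero-probability states so they can never be selected.
        if prob > 0.0 and xi <= cumulative:
            return k
    # Round-off guard: fall back to the last state with nonzero probability.
    for k in range(len(q) - 1, -1, -1):
        if q[k] > 0.0:
            return k
    raise ValueError("all transition probabilities are zero")
```

For example, with `q = [0.0, 0.25, 0.75]` (the current state is state 0, so its entry is zero) a random number of 0.1 selects state 1 and a random number of 0.26 selects state 2.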

Mode  sampling 

Suppose that the transition rate can be written as the sum of a number
of transition modes

$$\gamma_k(t) = \sum_i \gamma_{ki}(t) \qquad (7)$$

Each mode is represented by a two-parameter Weibull distribution with
a different exponent. Thus we may write

$$\int_0^t \gamma_{ki}(t')\, dt' = \left(\frac{t}{\theta_{ki}}\right)^{m_i}$$

If we allow several different values of $m_i$ to appear in such series, we
can reasonably represent most time-dependent failure rates of interest.
For example, the bathtub curve such as that shown in Fig. 1 may be
approximated as the superposition of three such terms with $m_1 < 1$,
$m_2 \approx 1$ and $m_3 > 1$. These correspond to early failures, random
failures and aging failures, respectively.

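For a multicomponent system all components are assumed to contain
the same values of $m_i$. Then the transition rate given by eqn (7) is
written as

$$\gamma_k(t) = \sum_i \frac{m_i}{\theta_{ki}} \left(\frac{t}{\theta_{ki}}\right)^{m_i - 1} \qquad (8)$$

where

$$\theta_{ki}^{-m_i} = \sum_{l \in O_k} \theta_{li}^{-m_i} + \delta_{m_i,1} \sum_{l \in F_k} \mu_l$$

Combining eqns (5) and (8), we may write the cumulative distributions
as

$$F(t \mid t', k) = 1 - \exp\left\{\sum_i \left[-\left(\frac{t}{\theta_{ki}}\right)^{m_i} + \left(\frac{t'}{\theta_{ki}}\right)^{m_i}\right]\right\}$$

This may be shown to be a minimum extreme value distribution⁴ where
the parent distributions are

$$F_i(t \mid t', k) = 1 - \exp\left\{-\left(\frac{t}{\theta_{ki}}\right)^{m_i} + \left(\frac{t'}{\theta_{ki}}\right)^{m_i}\right\} \qquad (9)$$

Thus we sample each of the $F_i$ for the time of the next transition by
mode $i$, and take the minimum value. Hence, for mode $i$ we choose a
uniformly distributed random number $\xi_i$ and set it equal to $F_i$. The
inversion of eqn (9) leads to

$$t_i = \theta_{ki} \left[\left(\frac{t'}{\theta_{ki}}\right)^{m_i} - \ln(1 - \xi_i)\right]^{1/m_i}$$

The transition is then taken at

$$t = \min_i t_i$$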
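
A minimal sketch of mode sampling, assuming each mode is described by a scale parameter and an exponent of the Weibull form in eqn (9). The names `sample_mode_time` and `mode_sampling` are hypothetical, and the uniform random numbers are passed in explicitly so that the functions are easy to check.

```python
import math
from typing import List, Sequence, Tuple


def sample_mode_time(t_prev: float, theta: float, m: float, xi: float) -> float:
    """Invert eqn (9) for one mode.

    Solves 1 - xi = exp(-(t/theta)**m + (t_prev/theta)**m) for t,
    with xi uniform on [0, 1).
    """
    return theta * ((t_prev / theta) ** m - math.log(1.0 - xi)) ** (1.0 / m)


def mode_sampling(t_prev: float,
                  modes: Sequence[Tuple[float, float]],
                  xis: Sequence[float]) -> Tuple[float, int]:
    """Return (t, i): the earliest sampled time and the mode that gave it.

    `modes` holds one (theta, m) pair per mode and `xis` one uniform
    random number per mode.
    """
    times: List[float] = [
        sample_mode_time(t_prev, theta, m, xi)
        for (theta, m), xi in zip(modes, xis)
    ]
    i = min(range(len(times)), key=times.__getitem__)
    return times[i], i
```

For a mode with exponent 1 the formula reduces to the exponential sampling rule with rate `1 / theta`, which gives a quick sanity check of the inversion.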


Self-transitions 


In this method we subtract and add the term $\gamma_{kk}(t)\, p_k(t)$ to the
right of eqn (1). This has no effect on the solution. However, the term
$k' = k$ is now included in the sums appearing in eqn (1) and in all
succeeding equations. We refer to these as self-transitions because they
represent transitions back into the same state from which the transition
originated. In effect we have transformed the equations from a semi-Markov
to a Markov formulation.
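Now suppose we choose $\gamma_{kk}(t)$ such that

$$\gamma_{kk}(t) = \gamma^0 - \sum_{k' \neq k} \gamma_{k'k}(t)$$

where $\gamma^0$ is a non-negative constant, chosen large enough that
$\gamma_{kk}(t) \ge 0$ over the time interval of interest. We thus have

$$\gamma_k(t) = \gamma^0$$

and the modified equations become

$$\frac{d}{dt} p_k(t) = -\gamma^0 p_k(t) + \sum_{k'} q(k \mid k', t)\, \gamma^0 p_{k'}(t)$$

This transformation enables us to write the succeeding equation in terms
of the exponential probability density

$$f(t \mid t', k) = \gamma^0 e^{-\gamma^0 (t - t')}$$

which may be sampled using a single random number

$$\xi = F(t \mid t', k') = 1 - e^{-\gamma^0 (t - t')}, \qquad t > t'$$

to obtain the time of the next transition as

$$t = t' - \frac{1}{\gamma^0} \ln(1 - \xi)$$

To determine the new state of the system, we again use eqn (6). Now,
however, with

$$q(k \mid k', t) = \gamma_{kk'}(t) / \gamma^0$$

the diagonal term $q(k' \mid k', t)$ is greater than zero. If the transition
$k' \to k'$ is sampled, then the system remains in the same state at $t$ and
the calculation continues. Otherwise, a new state $k$ is obtained.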
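
The self-transition procedure can be sketched as the loop below. It is a simplified illustration, not the authors' code: the names are hypothetical, the choice among the real destination states is collapsed into a single "real transition or self-transition" decision, and `gamma0` is assumed to bound the true rate out of the current state over the whole time range sampled.

```python
import math
import random
from typing import Callable, Optional


def next_real_transition(t_prev: float,
                         rate: Callable[[float], float],
                         gamma0: float,
                         rng: random.Random,
                         t_max: float = math.inf) -> Optional[float]:
    """Time of the next real transition out of the current state.

    `rate(t)` is the true total rate out of the state and `gamma0` a
    constant with gamma0 >= rate(t). Returns None if no real transition
    occurs before `t_max`.
    """
    if gamma0 <= 0.0:
        raise ValueError("gamma0 must be positive")
    t = t_prev
    while True:
        # Candidate transition time from the exponential density
        # with the constant rate gamma0.
        t -= math.log(1.0 - rng.random()) / gamma0
        if t > t_max:
            return None
        # A real transition is sampled with probability rate(t) / gamma0;
        # otherwise it is a self-transition and the state is unchanged.
        if rng.random() * gamma0 < rate(t):
            return t
```

The closer `gamma0` is to the true rate, the smaller the fraction of self-transitions and the fewer random numbers are spent per real transition.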


4  COMPONENT  MODELING 

With either of the sampling methods discussed in the preceding section
we may treat wear, preventive maintenance and repair, provided we are
able to model them within the framework of inhomogeneous Markov
processes. We first consider wear and then preventive maintenance for
situations where there is no repair. We then consider repair, first of
revealed and then of unrevealed failures. A revealed failure is one that
is known immediately; the modeling of repair is through an exponential
distribution of times to repair. An unrevealed failure is one that remains
in effect until the system is tested for failure and repaired at some
predetermined time interval. We illustrate each of these models for a
one-component system. In most cases, the generalization to multicomponent
systems, and to systems with component dependencies, is treated
analogously to that discussed previously¹,² for homogeneous
Markov processes.

Wear  and  preventive  maintenance 

In the case of wear without preventive maintenance, the failure rate
curve is likely to appear as the solid line in Fig. 2. To estimate
the reliability of the same component in the case where preventive
maintenance is performed at intervals $T$, we assume that each maintenance
restores the component to an as-good-as-new condition. The failure rate
with maintenance, $\lambda^*(t)$, is thus given by

$$\lambda^*(t) = \lambda(t - NT), \qquad NT \le t \le (N+1)T$$

where $\lambda(t)$ is the failure rate without maintenance and $N$ counts the
maintenance intervals already completed. The failure rate thus has a
periodicity as indicated by the dashed line in Fig. 2.

To treat the periodicity of preventive maintenance, mode sampling
must be applied one interval at a time in the following manner. The
distribution is sampled to determine whether a failure takes place in the
interval $0 \le t \le T$. If it does not, then time is set equal to $T$, and
preventive maintenance is assumed to take place. Then mode sampling
is used to determine if a failure takes place in the interval
$T < t \le 2T$, and so on. We hereafter refer to mode sampling as method a.

Fig. 2. Failure rate curve illustrating wear and periodic maintenance.

The self-transition method may be performed in a similar manner.
Sampling is first carried out to determine whether failure takes place in
the interval $0 \le t \le T$. If it does not, time is set equal to $T$ and the
interval $T \le t \le 2T$ is considered. We refer to this as method b.
Alternatively, we may simply apply self-transition to the entire problem
domain. The exponential is sampled to determine the next transition
regardless of the interval into which it falls. This is referred to as
method c.
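
Method a, applied to a single non-repairable component, can be sketched as follows. The function names and the way modes are passed as (scale, exponent) pairs are assumptions made for this illustration; maintenance is taken to be as-good-as-new, so the component age is reset to zero at the start of every interval.

```python
import math
import random
from typing import Optional, Sequence, Tuple


def failure_time_with_maintenance(modes: Sequence[Tuple[float, float]],
                                  interval: float,
                                  design_life: float,
                                  rng: random.Random) -> Optional[float]:
    """Failure time of one history, or None if the component survives."""
    start = 0.0
    while start < design_life:
        # Age at failure measured from the last maintenance: minimum
        # over the Weibull modes, each sampled from age zero.
        age = min(theta * (-math.log(1.0 - rng.random())) ** (1.0 / m)
                  for theta, m in modes)
        end = min(start + interval, design_life)
        if start + age <= end:
            return start + age
        start = end  # no failure in this interval: maintain and restart
    return None


def estimate_unreliability(modes: Sequence[Tuple[float, float]],
                           interval: float, design_life: float,
                           histories: int, rng: random.Random) -> float:
    """Fraction of histories that fail within the design life."""
    failures = sum(
        failure_time_with_maintenance(modes, interval, design_life, rng)
        is not None
        for _ in range(histories))
    return failures / histories
```

This is analog sampling without the variance reduction used in the paper, so it is only suitable for checking small examples against an analytic result.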

Revealed  failures 

If the component fails at time $t_f$, and the failure is revealed
immediately, then a distribution of repair times will be sampled to
determine the time of repair, $t_r$. The question must then be decided as to
what component failure rate should be used following repair. There are three
obvious models, all of which reduce to the standard revealed failure model
for the time-independent failure rates of homogeneous Markov processes.

If the repair of a revealed failure is to an as-good-as-new state, then,
following the failure, the failure rate is set back to the time zero value;
this model is indicated by line a in Fig. 3. Secondly, if the repair is made
to an as-good-as-old state, then the failure rate is set back to its value
at the time at which the failure took place. This model, indicated by
line b, assumes that no additional wear occurred during the downtime for
repair. Finally, in the continuous wear model we assume that the original
failure rate curve, indicated by line c, is followed, thus assuming that
the wear process continues unabated through the repair interval.

Of the three models for revealed failures, only that for continuous
wear falls within the framework of inhomogeneous Markov processes,
for only it has a component failure rate that is independent of repair
history. However, since $t_r - t_f$, the time interval during which the
system is in a failed state, is normally short, the difference between
curves b and c should be small; it amounts only to taking no credit for the
fact that the aging is slightly reduced due to the downtime for repair. In
the calculations that follow we model all revealed failures as continuous
wear, it being a good but pessimistic approximation to the as-good-as-old
case. Failures may occur such that scheduled maintenance takes
place before repair is completed, as indicated by $t_f'$ and $t_r'$ in
Fig. 3. In such cases the solid curve is used following repair. The current
model neglects downtime for preventive maintenance.

Fig. 3. Failure rate curves showing three models for repair of revealed
failures: (a) as-good-as-new; (b) as-good-as-old; (c) continuous wear.

Unrevealed  failures 

For unrevealed failures the component remains in a failed state until
repair or maintenance takes place at some predetermined time interval
$T$. For these failures the repair/maintenance may also be represented
by as-good-as-new, as-good-as-old or continuous aging models. These are
represented respectively by curves a, b and c in Fig. 4, where it is again
assumed that the failure occurs at $t_f$. In this case the continuous aging
model represents a poor approximation to as-good-as-old repair. With
preventive maintenance, however, restoration to an as-good-as-new state
often offers a reasonable model.

Fig. 4. Failure rate curves showing three models for repair of unrevealed
failures: (a) as-good-as-new; (b) as-good-as-old; (c) continuous wear.

For unrevealed failures the as-good-as-new model, a, falls within the
inhomogeneous Markov criterion, provided we assume that preventive
maintenance is performed at each test interval to return the component
to an as-good-as-new condition. The as-good-as-old criterion, b, implies
that the testing and repair do not affect wear mechanisms. Since it
violates the Markov criteria, it is not employed in the calculations that
follow. While the continuous aging model falls within the inhomogeneous
Markov framework, it is not considered further for unrevealed failures,
for with it one must presume that component age accumulates at the
same rate regardless of whether the component is in a failed state, even
for a long period of time. Hence in what follows all tests for unrevealed
failures are assumed to include the maintenance required to return the
component to an as-good-as-new state.

5  NUMERICAL  RESULTS 

In this section we first examine the sampling methods developed above,
along with the methods for treating revealed and unrevealed failures,
in a one-component system. The effects of wear, preventive maintenance
and repair models are then examined for a two-component system. Finally,
a ten-component system is used to demonstrate the application of the
Monte Carlo models to more realistic configurations in which wear,
component dependencies and combinations of revealed and unrevealed
failures are present. In the calculations that follow, importance sampling
in the form of both forced transitions and failure biasing¹ is employed
to reduce variance and improve computational efficiency. The application
of these variance reduction techniques as well as the procedures
for making reliability and availability estimates are identical to those used
in homogeneous Markov Monte Carlo.¹,² Unless otherwise specified, all
results are based on runs of 10 000 histories. In no case is more than a
few seconds required on a Control Data Cyber 205 in scalar mode.

Single  component 

The sampling methods for the time to transition are applied to a
component with a failure rate given by

$$\lambda(t) = \lambda_0 + \frac{m}{\theta} \left(\frac{t}{\theta}\right)^{m-1} \quad (\mathrm{yr}^{-1}) \qquad (10)$$

where we use $\lambda_0 = 0.013\ \mathrm{yr}^{-1}$, $\theta = 7.5$ yr and
$m = 2.5$. In Table 1 the unreliability, $1 - R$, is given for a 5-yr design
life, where $N$ is the number of intervals into which the design life is
divided for purposes of performing as-good-as-new preventive maintenance.
Thus for $N = 5$ preventive maintenance is performed annually. The reference
results are obtained by analytic evaluation of the appropriate formulae.⁵
In the case that no wear is present, $\lambda(t) \to \lambda_0$, and the
unreliability is reduced to 0.062932.
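
The analytic reference values for this single component can be sketched directly, assuming the failure rate of eqn (10) and as-good-as-new maintenance at $N$ equal intervals, so that the wear term restarts from zero age in each interval. The function name `unreliability` and its arguments are choices made for this illustration.

```python
import math

LAMBDA0 = 0.013  # constant failure rate, per yr (from the text)
THETA = 7.5      # Weibull scale parameter, yr (from the text)
M = 2.5          # Weibull exponent (from the text)


def unreliability(design_life: float, n_intervals: int = 1,
                  wear: bool = True) -> float:
    """1 - R at the end of the design life.

    The cumulative hazard is lambda0 * L for the constant term plus,
    when wear is included, N * (L / (N * theta))**m for the wear term,
    which is reset at each of the N maintenance intervals.
    """
    interval = design_life / n_intervals
    hazard = LAMBDA0 * design_life
    if wear:
        hazard += n_intervals * (interval / THETA) ** M
    return 1.0 - math.exp(-hazard)
```

With `wear=False` and a 5-yr design life this gives approximately 0.06293, in agreement with the no-wear value quoted above; with wear, increasing the number of maintenance intervals lowers the unreliability toward that value.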

To compare the sampling methods we have tabulated the computing
time, $t$, the estimated root sample variance, $\sigma$, and the widely used
figure of merit, $1/(\sigma^2 t)$. The quantity $\pm\sigma/\sqrt{N}$, with $N$
being the number of trials, is the estimated 68% confidence interval that
appears in all subsequent tables, and the figure of merit is a standard
measure for comparing the computational efficiencies of alternative Monte
Carlo procedures.
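
These summary quantities can be computed from the per-history scores as in the sketch below. The function name `summarize` and the dictionary keys are hypothetical, and the scores are assumed to be one number per history (for analog sampling, 1 for a failed history and 0 otherwise).

```python
import math
import statistics
from typing import Dict, Sequence


def summarize(scores: Sequence[float], cpu_time: float) -> Dict[str, float]:
    """Estimate, 68% confidence half-width and figure of merit.

    `scores` holds one score per history and `cpu_time` is the
    computing time of the run, in any consistent unit.
    """
    n = len(scores)
    sigma = statistics.stdev(scores)  # root sample variance
    return {
        "estimate": statistics.fmean(scores),
        "half_width": sigma / math.sqrt(n),
        "figure_of_merit": 1.0 / (sigma ** 2 * cpu_time),
    }
```

Because the sample variance and the computing time both enter the figure of merit, a method that lowers the variance only pays off if it does not increase the time per history by a larger factor.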

As indicated by these and other model problem results, the variant
of the self-transition method labeled method (b) is more efficient
computationally than variant (c). Both self-transition methods are clearly
superior to the mode sampling method, method (a). It should be observed that
for multiple-component systems mode sampling also becomes more
cumbersome, particularly for problems with repair models. Conversely,
self-transition becomes more efficient in the presence of preventive
maintenance, since the resulting reduction in the maximum failure rate
over the life of the components reduces the fraction of self-transitions.
Thus, methods (a) and (b) are disregarded for subsequent problems.

In Table 2 are shown single-component unavailability results for both
revealed and unrevealed failures. The interval unavailability is tabulated
for a 5-yr design life. For the revealed failures a repair rate of
$\mu = 10\ \mathrm{yr}^{-1}$ is used, and wear is assumed to accumulate
through the repair period as indicated by model c in Fig. 3. For the
time-independent failure rate $\lambda = \lambda_0$, the second term is
deleted from eqn (10). In the latter case, the results indicate that
preventive maintenance has no effect on the unavailability. When wear is
added, by using eqn (10) to represent the time-dependent failure rate, the
unavailability increases as would be expected. When as-good-as-new
preventive maintenance is included on an annual basis, $N = 5$, the
unavailability is reduced significantly, remaining, of course, above the
value for which no wear is present.

TABLE 2
Single-component Interval Unavailabilities*

                        No maintenance                     Annual maintenance
Revealed failures‡
  Constant, λ₀†         0.1262 × 10⁻² ± 0.0013 × 10⁻²      0.1262 × 10⁻² ± 0.0013 × 10⁻²
  Increasing, λ(t)†     0.809 × 10⁻² ± 0.0110 × 10⁻²       0.1886 × 10⁻² ± 0.0026 × 10⁻²
Unrevealed failures
  Constant, λ₀†         0.4717 × 10⁻¹ ± 0.0020 × 10⁻¹      0.0187 × 10⁻¹ ± 0.0004 × 10⁻¹
  Increasing, λ(t)†     2.1452 × 10⁻¹ ± 0.0230 × 10⁻¹      0.02525 × 10⁻¹ ± 0.0007 × 10⁻¹

* 5-yr design life; † failure rates from eqn (10); ‡ μ = 10 yr⁻¹.

For  unrevealed  failures,  the  unavailability  results  are  smallest  for  the 
constant  failure  rate  case,  where  no  wear  is  present.  For  the  case  with 
wear  the  annual  test/repair  is  assumed  to  restore  the  component  to  an 
as-good-as-new condition as in model a of Fig. 4. This modeling is
necessary  to  remain  within  the  Markov  framework  as  discussed  in  the 
preceding  section.  As  indicated  by  the  table,  the  annual  test  and  repair 
causes  a  significant  decrease  in  the  unavailability  whether  or  not  wear 
is  present.  For  all  unrevealed  failure  calculations  here  and  in  what 
follows  it  is  assumed  that  the  repair  time  can  be  ignored  compared  to 
the  downtime  in  the  unrevealed  failed  condition. 

Two  components 

We consider next a simple active parallel system consisting of two
components in order to illustrate component interactions in the presence
of wear, preventive maintenance and shared-load dependencies. Each
component is represented by a failure rate given by eqn (10), either with
the last term deleted (the $\lambda = \lambda_0$ time-independent failure
rate) or in the wear model with both terms present. The failure and repair
parameters are those used for the single-component system. Preventive
maintenance, where included, is performed annually on a staggered basis for
the duration of the 5-yr design life (i.e. maintenance is performed at 1, 3
and 5 yr on component one, and at 2 and 4 yr on component two).
The models for revealed and unrevealed failures are the same as above.

The unreliabilities and unavailabilities are given in Tables 3 and 4,
respectively. In these calculations the component failures are assumed
to be independent for the time-independent failure rate λ_0. For the
time-dependent failure rates λ(t), both independent and shared loads are
given. In the shared load cases the dependency is modeled by assuming
an increased rate of wear when only one component is in operation.
This is accomplished by replacing θ by θ' = θ - Δθ in eqn (10) with Δθ =
2.5 yr. In the unreliability calculations repair of the redundant component
is allowed until system failure occurs. The increases in unreliability and




TABLE 3

Two-component System Unreliability*

                                    No maintenance                     Maintenance

Revealed failures
  Constant, λ_0                     0.1652 x 10^-3 ± 0.0003 x 10^-3    0.1652 x 10^-3 ± 0.0003 x 10^-3
  Increasing, λ(t) (independent)    0.9290 x 10^-2 ± 0.0180 x 10^-2    0.0657 x 10^-2 ± 0.0004 x 10^-2
  Increasing, λ(t) (shared load)    0.2320 x 10^-1 ± 0.0044 x 10^-1    0.0123 x 10^-1 ± 0.0001 x 10^-1

Unrevealed failures
  Constant, λ_0                     0.3940 x 10^-2 ± 0.0022 x 10^-2    0.0432 x 10^-2 ± 0.0017 x 10^-2
  Increasing, λ(t) (independent)    0.1209 ± 0.0012                    0.0736 x 10^-2 ± 0.0046 x 10^-2
  Increasing, λ(t) (shared load)    0.2543 ± 0.0020                    0.1262 x 10^-2 ± 0.0081 x 10^-2

* 5-yr design life

TABLE 4

Two-component System Interval Unavailability

                                    No maintenance                     Maintenance

Revealed failures
  Constant, λ_0                     0.1640 x 10^-5 ± 0.0017 x 10^-5    0.1640 x 10^-5 ± 0.0017 x 10^-5
  Increasing, λ(t) (independent)    0.9177 x 10^-4 ± 0.0290 x 10^-4    0.0648 x 10^-4 ± 0.0009 x 10^-4
  Increasing, λ(t) (shared load)    0.2426 x 10^-3 ± 0.0110 x 10^-3    0.0123 x 10^-3 ± 0.0002 x 10^-3

Unrevealed failures
  Constant, λ_0                     0.2010 x 10^-2 ± 0.0021 x 10^-2    0.0376 x 10^-2 ± 0.0015 x 10^-2
  Increasing, λ(t) (independent)    0.4085 x 10^-1 ± 0.0071 x 10^-1    0.0069 x 10^-1 ± 0.0004 x 10^-1
  Increasing, λ(t) (shared load)    0.7922 x 10^-1 ± 0.0120 x 10^-1    0.0109 x 10^-1 ± 0.0007 x 10^-1

unavailability  due  to  load  sharing  and  to  wear,  and  the  corresponding 
decreases  due  to  maintenance,  are  apparent  from  Tables  3  and  4. 
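The qualitative trends in Tables 3 and 4 can be reproduced for a single unrepaired component with a few lines of Python. This is only a sketch under stated assumptions. Eqn (10) is not reproduced in this section, so the code assumes a Weibull-type form, λ(t) = λ_0 + (m/θ)(t/θ)^(m-1), and the parameter values (λ_0 = 0.01 per yr, θ = 10 yr, m = 2.5) are hypothetical. Only Δθ = 2.5 yr and the 5-yr design life come from the text.

```python
import math


def cumulative_hazard(age, lam0, theta, m):
    """Integral from 0 to `age` of lam0 + (m/theta) * (t/theta)**(m - 1).

    The functional form is an assumption standing in for eqn (10).
    """
    return lam0 * age + (age / theta) ** m


def unreliability(design_life, lam0, theta, m, maint_interval=None):
    """Failure probability of one unrepaired component over the design life.

    If `maint_interval` is given, as-good-as-new preventive maintenance
    resets the wear age to zero at each multiple of the interval.
    """
    if maint_interval is None:
        total = cumulative_hazard(design_life, lam0, theta, m)
    else:
        n_full, remainder = divmod(design_life, maint_interval)
        total = (n_full * cumulative_hazard(maint_interval, lam0, theta, m)
                 + cumulative_hazard(remainder, lam0, theta, m))
    return 1.0 - math.exp(-total)


if __name__ == "__main__":
    lam0, theta, m = 0.01, 10.0, 2.5   # hypothetical values
    delta_theta = 2.5                  # yr, as quoted in the text
    print("wear, no maintenance:", unreliability(5.0, lam0, theta, m))
    print("wear, annual maintenance:", unreliability(5.0, lam0, theta, m, 1.0))
    print("wear, reduced theta:", unreliability(5.0, lam0, theta - delta_theta, m))
```

With m greater than one, resetting the wear age lowers the accumulated hazard, while replacing θ by θ - Δθ raises it. These are the same directions of change that the tables show for maintenance and for load sharing.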

Ten-component  system 

To illustrate the capabilities of the inhomogeneous Markov Monte
Carlo formalism we consider next the ten-component system described
by the fault tree of Fig. 5. The system has been analysed previously
(Refs 1 and 2) using the constant failure and repair rate data given in
Table 5. The unreliability and unavailability results are given in Tables 6
and 7, respectively. In the unreliability calculations, repair is allowed on
redundant components until system failure occurs.

Model (1) is a reference calculation with independent revealed failures
and using the time-independent failure and repair rate data of Table 5.
In this and the succeeding calculations the design life is 1000 h. Since
no wear is present, maintenance has no effect on model (1).
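Each tabulated result is a Monte Carlo estimate quoted with an uncertainty. As a minimal illustration of how such a pair can be formed, the Python sketch below estimates the unreliability of a single non-repairable component with a constant failure rate and attaches one standard error of the sample mean. The reading of the ± values as a standard error, the rate of 10^-4 per hour and the sample size are all assumptions. Only the 1000-h design life comes from the text, and the analog sampling shown here is simpler than the Markov Monte Carlo scheme of the paper.

```python
import math
import random


def estimate_unreliability(lam, mission_time, n_histories, seed=1):
    """Analog Monte Carlo estimate of P(failure before mission_time).

    Returns the estimate and its binomial standard error.
    """
    rng = random.Random(seed)
    failures = sum(
        1 for _ in range(n_histories)
        if rng.expovariate(lam) < mission_time  # exponential failure time
    )
    p_hat = failures / n_histories
    std_err = math.sqrt(p_hat * (1.0 - p_hat) / n_histories)
    return p_hat, std_err


if __name__ == "__main__":
    p_hat, std_err = estimate_unreliability(1.0e-4, 1000.0, 100000)
    exact = 1.0 - math.exp(-1.0e-4 * 1000.0)
    print(f"estimate: {p_hat:.5f} +/- {std_err:.5f}  (exact {exact:.5f})")
```

The standard error of an analog estimate shrinks only as the inverse square root of the number of histories, which is why rare system failures usually call for variance reduction rather than more histories.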

In model (2) wear effects are added to components one through three
by representing them with eqn (10). In this, λ_0 is determined from Table 5
and the wear is characterized by θ_i = 10 HI 5 h and m_i = 2.5. Components
one through three constitute a 2/3 subsystem, and we assume load
sharing by reducing θ_i of the operating components by Δθ = 3000 h for
each failed component. For the calculations with maintenance, as-good-
as-new preventive maintenance is performed at 100-h intervals on a
staggered basis (e.g. at 100, 400, ... h for component (1), at 200, 500, ... h




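TABLE 5

Data for Example Problem

Component i    Group    Failure rate λ_i    Repair rate μ_i
     1           1           0.26               0.042
     2           1           0.26               0.042
     3           1           0.26               0.042
     4           2           3.5                0.17
     5           2           3.5                0.17
     6           2           3.5                0.17
     7           3           0.5                0
     8           3           0.5                0
     9           4           0.[?]              0
    10           4           0.[?]              0

Note: the units and scale factor in the rate column headers, and the second
digit of the failure rate for components 9 and 10, are illegible in the source.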

for component (2) and at 300, 600, ... h for component (3)). Components
four through ten are treated as in model (1). The results indicate both
the increase in unavailability due to the wear effects and the mitigation
of wear by preventive maintenance.
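The load-sharing rule of model (2) amounts to a small state-dependent adjustment of the wear parameter. The Python sketch below shows one way to express it. The helper names are hypothetical, the value θ = 10000 h is an assumption chosen only for illustration, and "2/3" is read here as a two-out-of-three subsystem that works while at least two components operate. Only Δθ = 3000 h comes from the text.

```python
def effective_theta(theta, delta_theta, n_failed):
    """Wear parameter of an operating component under load sharing.

    theta is reduced by delta_theta for each failed component in the
    subsystem, which increases the rate of wear of the survivors.
    """
    reduced = theta - delta_theta * n_failed
    if reduced <= 0.0:
        raise ValueError("load-sharing reduction exceeds theta")
    return reduced


def two_of_three_failed(operating):
    """True if fewer than two of the three components are operating."""
    states = list(operating)
    if len(states) != 3:
        raise ValueError("expected exactly three component states")
    return sum(bool(s) for s in states) < 2


if __name__ == "__main__":
    theta = 10000.0       # h, assumed for illustration
    delta_theta = 3000.0  # h, as quoted in the text
    for n_failed in range(3):
        print(n_failed, "failed ->", effective_theta(theta, delta_theta, n_failed))
    print(two_of_three_failed([True, True, False]))
    print(two_of_three_failed([True, False, False]))
```

In a simulation, `effective_theta` would be re-evaluated at every failure or repair transition in the subsystem, since the wear rate of each survivor depends on the current number of failed partners.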

In model (3) components one through three and seven through ten
are treated as in model (1). Components four through six are taken to
have unrevealed failures by setting μ_i = 0. Components four through six
also constitute a 2/3 subsystem. For the calculations with maintenance
the same 100-h interval staggered schedule is used as described for
components one through three in model (2). Tables 6 and 7 indicate the
sharp loss in reliability when unrevealed failures are present, along with
the extent of the loss mitigation due to the staggered test and repair
schedule.

In  model  (4)  the  capability  of  combining  revealed  and  unrevealed 
failures  as  well  as  wear  and  preventive  maintenance  in  a  single  calculation 
is  illustrated.  In  these  simulations  components  one  through  three  are 
treated  as  in  model  (2),  components  four  through  six  as  in  model  (3) 


TABLE 6

Unreliability for Ten-component System

Model    No maintenance                     Maintenance
(1)      0.4394 x 10^-4 ± 0.0046 x 10^-4    0.4394 x 10^-4 ± 0.0046 x 10^-4
(2)      0.5367 x 10^-4 ± 0.0078 x 10^-4    0.4387 x 10^-4 ± 0.0048 x 10^-4
(3)      0.3494 x 10^-2 ± 0.0042 x 10^-2    0.0180 x 10^-2 ± 0.0014 x 10^-2
(4)      0.4083 x 10^-2 ± 0.0055 x 10^-2    0.0373 x 10^-2 ± 0.0015 x 10^-2



TABLE 7

Unavailability for Ten-component System

Model    No maintenance                     Maintenance
(1)      0.1436 x 10^-4 ± 0.0053 x 10^-4    0.1436 x 10^-4 ± 0.0053 x 10^-4
(2)      0.2383 x 10^-4 ± 0.0120 x 10^-4    0.1522 x 10^-4 ± 0.0061 x 10^-4
(3)      0.2339 x 10^-2 ± 0.0036 x 10^-2    0.0133 x 10^-2 ± 0.0012 x 10^-2
(4)      0.2793 x 10^-2 ± 0.0046 x 10^-2    0.0194 x 10^-2 ± 0.0013 x 10^-2

Note: the exponent for models (1) and (2) is partly illegible in the source
and is read here as 10^-4.

and  the  remaining  four  components  as  in  model  (1).  Tables  6  and  7 
illustrate  the  synergetic  effects  of  the  combined  failure  and  repair  models 
on  the  system  behavior. 


6  DISCUSSION 

In  the  foregoing  sections  the  capabilities  of  inhomogeneous  Markov 
Monte  Carlo  methods  are  demonstrated.  They  allow  wear  and  preventive 
maintenance  to  be  modeled  within  the  simulation  of  large  systems. 
Moreover,  a  limited  class  of  repair  models  may  also  be  included  for 
both  revealed  and  unrevealed  failures.  In  our  illustrations  we  have  not 
included  examples  of  standby  or  shared  repair  crew  dependencies,  but 
these  also  are  easily  included  within  the  inhomogeneous  Markov 
framework.  Likewise,  the  method  is  easily  extended  to  account  for  fixed 
component  downtimes  for  testing  and  repair  of  unrevealed  failures  as 
well  as  for  imperfect  repair. 

A more challenging task is the generalization of the methods to
include non-Markov processes. Two of the more important of these are
the as-good-as-new repair of revealed failures and the as-good-as-old
repair of unrevealed failures, both illustrated in Figs 2 and 3. Inclusion
of these phenomena, as well as the use of more realistic cumulative
damage models for wear, are the obvious next steps in the development
of Monte Carlo simulation of system reliability.


ACKNOWLEDGMENT 

This  work  was  supported  in  part  by  the  US  Air  Force  Office  of  Scientific 
Research,  Contract  AFOSR-84-0340. 




REFERENCES 

1. Lewis, E. E. and Boehm, F., Monte Carlo simulation of Markov unreliability
models, Nucl. Engng Des., 77 (1984), pp. 49-62.

2. Tu, Z. and Lewis, E. E., Component models in Markov Monte Carlo
simulation, Reliab. Engng, 13 (1985), pp. 45-61.

3. Larson, H. S. and Shubert, B. O., Probabilistic Models in Engineering Science,
Vol. II, Wiley, New York, 1979.

4. Gumbel, J. E., Statistics of Extremes, Columbia University Press, New York,
1958.

5. Lewis, E. E., Introduction to Reliability Engineering, Chapter 8, Wiley, New
York (in press).


